 data custodian


End to End Collaborative Synthetic Data Generation

Sikha Pentyala, Geetha Sitaraman, Trae Claar, Martine De Cock

arXiv.org Artificial Intelligence

The success of AI is based on the availability of data to train models. While in some cases a single data custodian may have sufficient data to enable AI, often multiple custodians need to collaborate to reach a cumulative size required for meaningful AI research. The latter is, for example, often the case for rare diseases, with each clinical site having data for only a small number of patients. Recent algorithms for federated synthetic data generation are an important step towards collaborative, privacy-preserving data sharing. Existing techniques, however, focus exclusively on synthesizer training, assuming that the training data is already preprocessed and that the desired synthetic data can be delivered in one shot, without any hyperparameter tuning. In this paper, we propose an end-to-end collaborative framework for publishing of synthetic data that accounts for privacy-preserving preprocessing as well as evaluation. We instantiate this framework with Secure Multiparty Computation (MPC) protocols and evaluate it in a use case for privacy-preserving publishing of synthetic genomic data for leukemia.
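To make the MPC idea concrete, here is a minimal sketch of additive secret sharing, one common building block of MPC protocols. It is not the protocol from the paper: the function names, the modulus, and the patient counts are all assumptions chosen for illustration. It shows how several custodians could learn a joint total, such as the combined number of patients, without any of them revealing its own count.

```python
import secrets

# Illustrative modulus for share arithmetic (an assumption, not from the paper).
PRIME = 2**61 - 1


def share(value, n_parties, modulus=PRIME):
    """Split an integer into n additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=PRIME):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=PRIME):
    """Simulate n custodians computing the sum of their private values."""
    n = len(private_values)
    # Custodian i splits its value into n shares, one destined for each party.
    all_shares = [share(v, n, modulus) for v in private_values]
    # Party j locally adds the j-th share it received from every custodian.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the partial sums are combined, so only the total is revealed.
    return reconstruct(partial_sums, modulus)


# Hypothetical per-site patient counts.
patient_counts = [12, 7, 23]
total = secure_sum(patient_counts)
```

Each individual share is uniformly random, so a single party's view says nothing about another custodian's input. A real deployment would also need secure channels between parties and protection against parties that deviate from the protocol, which this simulation in one process does not model.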


Introducing Bitfount

#artificialintelligence

Bitfount is a federated analytics and machine learning platform that makes extracting value from sensitive data easy, fast, private, and secure. For data custodians and data scientists or researchers partnering to achieve better insights from data, Bitfount combines the best of data collaboration design with advanced privacy-preserving capabilities, while playing nicely with all of your existing tools and, crucially, not requiring the transfer of any raw data. Data collaboration today is a painful, messy business. As anyone who has attempted to set up a collaboration around sensitive data will know, the current process is generally slow and frustrating. Valuable datasets languish in silos as a result of regulatory or commercial sensitivity concerns, incompatible data management solutions, lengthy contractual processes, or just a plain lack of understanding of what data is available for which purposes within an organisation.


Microsoft has high hopes for Australian government's big data

#artificialintelligence

Microsoft wants the Australian government to close loopholes in proposed data sharing-and-release legislation that it believes could be used to shut down or limit access to data without explanation. The software giant used a submission to a Prime Minister & Cabinet-led consultation to outline concerns that data could be too easily withheld or not offered in the first place, despite assertions that "much of the Australian government's data is not personal or sensitive". Microsoft suggested that Australian laws should, in part, mimic the EU's reuse of public sector information directive, which requires agencies to explain why they deny access to data. "We note that the proposed process for sharing data does not appear to require Commonwealth data custodians to provide an explanation either when denying a data access request, or if they decide not to provide open access to data in the first instance," Microsoft said. "[We] suggest that the bill require data custodians to provide such an explanation.